ABSTRACT


The members of a NATO Human Factors and Medicine (HFM) Panel Research Task Group (RTG) spent 
the last three years examining issues related to human effectiveness within embedded virtual simulations 
(EVS) for training. EVS is an enabling technology that provides an interface to interactive simulations 
that reside within or are appended to the operational equipment. It can provide links to local and/or 
geographically distant trainees and instructional resources. EVS enables a full range of capabilities that 
support training and development of individual and team knowledge and skills. The RTG evaluated four 
primary topics associated with EVS: military requirements, training management, human interaction and 
the utility of intelligent agents within embedded virtual training environments. This paper presents the 
group’s evaluation of the human interface to EVS and the impact of human factors on learning and 
applications of EVS. 


1. INTRODUCTION 


The purpose of embedded training (ET) is to make use of operational equipment so that operators can train 
effectively. The implementation of ET requires fundamental capabilities, including the presentation of 
scenarios, performance assessment, feedback and management [1]. Although ET is not a new concept, 
recent and emerging technologies provide opportunities for enhancing its use and efficacy. The enabling 
technologies have grown in number and capability and they are potentially more cost effective than ever. 
Among these are technologies for interfacing the human operator with a virtual environment. For example, 
augmented or mixed reality can combine visual images of the real world, including prime equipment, and 
other people, real or synthetic, with computer-generated environments. 


In October 2009, a Research Task Group (RTG) of the Human Factors and Medicine Panel 
(HFM) of NATO conducted a workshop in Orlando, Florida, on the Human Dimensions of Embedded 
Virtual Simulation (EVS). The overall goals of the workshop were to identify gaps and to find potential 
solutions that will allow effective implementation of EVS. 


The workshop consisted of four sessions that examined the current issues and practices with EVS. These 
included policy and user requirements, human effectiveness, human interaction, and learning technologies. 
This paper considers human interaction with EVS, which is a means of interfacing trainees to ET 
scenarios. Sensory displays and controls were considered as components of the human interface. Three 
papers were presented, followed by a mind-mapping exercise that took place within three groups. The 
facilitators asked each group to consider the following question as a means of stimulating discussion: 
“What are the capabilities and limitations of EVS interface technology?” The following section of this 
report summarizes the presentations and outcomes of the discussions. 


2. SESSION PRESENTATIONS
2.1 Human-Robot Interaction 


Neuhoefer, Kausch and Schlick [2] addressed the need for human-in-the-loop, immersive simulation 
technologies as a means of accelerating safety assessments in the control of industrial robots. They noted 
that optical, see-through head-mounted displays (HMDs) provide an attractive solution since they have a 
small footprint, are less expensive than alternative visual display systems (e.g., CAVEs) and are very 
immersive, but that the size, weight, and resolution of see-through HMDs are problems for usability. An 
additional problem is that it is difficult to make use of see-through HMDs if virtual objects need to 
occlude real objects in the visual scene. Development efforts to address this problem were described; they 
make use of addressable focal planes for combining images of real and virtual objects. The industrial 
application involved handling and cleaning heavy parts with a blasting tool that shot pellets. Neuhoefer, 
Kausch and Schlick determined that visual depth, haptic, and auditory cues were important for the 
simulation. They compared two implementations of the human interface with a stereoscopic HMD. In an 
augmented reality (AR) condition, the operators were able to see their hand and the real tool. In a virtual 
reality (VR) condition only computer-generated images were provided. The performances of 40 users, and 
the workloads that they experienced, were measured for each condition. Significantly more virtual pellets 
were used to perform the task with augmented reality. The participants reported that more pellets were 
needed because their aim was affected by mismatch in the orientation of the pellets and the blasting tool. 
No significant difference was found in any measure of workload on the NASA-TLX questionnaire. 
Nevertheless, the participants preferred the interface that used augmented reality as a potential solution 
because they could immediately see their hand move. 


2.2 Spatial Perception 


Sandor, Hartnagel, Bringoux, Bourdin, Godfroy and Roumes [3] also considered the advantages and 
disadvantages of HMDs. They noted the benefit of HMDs for conveying three-dimensional information 
and the problems of a reduced field-of-view and a head-fixed, visual frame. They were concerned that 
these characteristics would negatively affect visual orientation, particularly the perception of the direction 
of gravity. They first conducted a study to determine if a tilted, large-scale, immersive, virtual 
environment would affect judgments of verticality in the same way as tilted real environments. Variations 
of the Rod and Frame Test (RFT) were used. Three different virtual environments were presented: one 
replicated the traditional RFT, one provided wallpaper on the walls, and another added furniture and 
objects to enhance depth cues. In addition, two methods were used to adjust the rod. One required use of a 
computer mouse to adjust the vertical orientation of the rod; the other used a real, hand-held rod. A typical 
sinusoidal relationship between the amount of frame tilt and rod tilt was found for the real and the virtual 
environments. Within the virtual environments, the amplitude of the sine wave grew with scene detail. In 
turn, these functions were affected by the adjustment method. The condition that allowed the participants 
to hold the rod reduced misperception of the gravitational direction. Hence, there is evidence that haptic 
control and feedback reduce the misleading effects of greater scene detail. This evidence provides an 
argument for multimodal cuing in ET applications involving spatial orientation. 


Sandor et al. [3] also reported the results of a second study that investigated the spatial relationship 
between visual and auditory cues. In a dark room, a spot of light and a sound were presented at the same 
time, at the same location or in different locations, on a flat, frontal plane. The participants judged whether 
a light and sound were fused, that is, whether or not they appeared to be coming from the same position in 
space. The limits for fusion were found to be greater in height than in width (at eye level) and 
progressively greater for stimuli as they moved away from straight ahead. To determine if gaze direction 
affects fusion, the investigators required the participants to make judgments when gaze and head 
orientation were aligned and misaligned. They found that the fusion limits were influenced by both gaze 
direction and head orientation in an equal manner. These results, obtained in a darkened room, replicate 
results in a lighted room where the other visual cues for orientation are available, and they show that 
fusion accuracy varies with the relative alignment of the eyes and ears. 


2.3 Dismounted Soldier Requirements 


A third presentation in the workshop session on human interfaces was made by Dyer [4] who was 
concerned with the impacts that embedded training could have on soldier systems. Dyer noted that US 
Army policy states that embedded training should not adversely impact a system’s operational capability 
and that embedded training needs to provide systems-related training and feedback. Dyer also noted that 
size, weight and power are critical considerations affecting feasibility of embedded training for soldier 
systems, especially for fully embedded, go-to-war capability. Dyer described the requirements for 
embedded training for the Ground Soldier System (GSS) and the investigation of alternative architectures 
for embedded training, which included virtual training in a facility and a “stand-alone” mode. The 
principal components of the GSS are a wearable computer and a GPS coupled to a helmet-mounted 
display system. The conclusion was that only a few tasks were amenable to virtual simulation and that an 
approach for identifying the tasks and skills that are most appropriate for embedded training with the GSS 
was needed. 


Dyer described a funneling approach for identifying the tasks; it first considered psychological 
dimensions, such as the memory decay for the task, and task characteristics, for example, its frequency. A 
second step considered military criteria that consisted of questions focused on sustainment, rather than 
initial skills training. Dyer described some tasks to illustrate the use of the questions to identify tasks 
appropriate for embedded training, but acknowledged that the process has not been validated. Dyer 
advanced the notion that memory aids might be a better and more acceptable means of maintaining skills 
than virtual exercises. Dyer provided a few examples of the challenges that embedded virtual 
environments need to address for dismounted soldiers. One example of the challenges is that many cues 
and signals are used by a fire team getting ready to clear a room. Touch and gesture are used by the team 
members. Hearing is used to locate hostiles and non-combatants. The soldier’s acceptance of EVS as a 
means of training and rehearsal was identified as a problem by Dyer, who noted that soldiers gain 
confidence by practicing on ground similar to the operational setting and that their preference is for live 
training. 
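The funneling approach that Dyer described can be sketched as a two-stage filter. The sketch below is illustrative only: the field names, thresholds and example tasks are hypothetical assumptions and are not taken from Dyer [4], whose process, as noted, has not been validated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    # All fields are hypothetical stand-ins for the dimensions named in the text.
    name: str
    memory_decay: float      # 0 (retained well) to 1 (forgotten quickly)
    performed_per_year: int  # how often the task is performed on the job
    sustainment: bool        # True if the need is sustainment, not initial training


def passes_psychological_screen(task: Task) -> bool:
    """Stage 1 (assumed thresholds): keep tasks that decay quickly and are seldom performed."""
    return task.memory_decay >= 0.5 and task.performed_per_year <= 12


def passes_military_screen(task: Task) -> bool:
    """Stage 2: keep tasks whose training need is skill sustainment."""
    return task.sustainment


def funnel(tasks: List[Task]) -> List[Task]:
    """Apply the two screens in order, narrowing the candidate list."""
    stage_one = [t for t in tasks if passes_psychological_screen(t)]
    return [t for t in stage_one if passes_military_screen(t)]


candidates = [
    Task("task A", memory_decay=0.8, performed_per_year=2, sustainment=True),
    Task("task B", memory_decay=0.2, performed_per_year=200, sustainment=True),
    Task("task C", memory_decay=0.9, performed_per_year=1, sustainment=False),
]
print([t.name for t in funnel(candidates)])
```

Ordering the screens this way means the military questions are asked only of tasks that have already passed the psychological and task-characteristic screen, which is the sense in which the approach narrows like a funnel.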


3. DISCUSSION 
3.1 In-Flight Training 


The presentations within the session helped to inform and stimulate the group discussions that followed as 
each group considered the limitations and capabilities of EVS as an enabling technology for embedded 
training. The discussions were also informed by the preceding addresses of the keynote speakers and 
several presentations that were also relevant to issues of the human interface. Among these was the 
keynote of Verhaaf [5] who provided a user’s perspective on the need to maintain high proficiency for air 
combat while deployed. He described successful exploratory use of EVS by the Royal Netherlands Air 
Force (RNLAF) Command. It was used for training F-16 combat skills as a precursor for application to the 
F-35 Joint Strike Fighter. Simulated threats were fed to the sensor systems of the aircraft in flight, thus 
allowing several pilots to engage virtual targets in tactical manoeuvres Beyond Visual Range (BVR). 


The virtual world included the threats and targets, their electronic signatures, weapons and dynamic 
behaviours, involving strategies, tactics, manoeuvres and countermeasures corresponding with their real-world 
roles. This achievement demonstrates that it is technically feasible to include BVR engagements in 
an EVS training exercise. However, simulated engagements Within Visual Range (WVR), that is, within 
about 10 nautical miles (depending on visual conditions), provide major technical challenges since they 
require the overlay of virtual objects upon the real world as viewed by the pilot. The visual requirements 
for training target identification exceed the capabilities of state-of-the-art HMDs; pixel resolution and 
(colour) contrast are now inadequate for visual target identification (VID). HMDs or suitable alternatives 
for displaying realistic, virtual opponents in an EVS do not yet exist [13]. Roessingh, van Sijll & 
Johnson [13] identify 12 visual requirements for WVR embedded simulations. These include the 
following: 


1. An image update rate and a display refresh rate of at least 80 Hz, but possibly much more than 80 
Hz. 


2. The time delay of the WVR-target visualisation system should be less than 20 milliseconds. In 
other words, the displayed position of simulated targets should not lag more than 20 ms 
behind simulated target motions, actual aircraft motions, and actual head motions. 


3. A field size (Field of Regard) of the simulated scene in the range of 300 degrees horizontally and 
150 degrees vertically. 


4. A field depth in the range of 10 to 18.5 km. 


5. Scene management must be based on point-of-gaze measurement. The types of head and eye 
movements that must be measured would need to be specified and depend on 
other requirements. 


6. Occlusion of virtual targets by real world objects must be managed, that is, the hiding of virtual 
targets or part of a virtual target behind real world objects which are at closer distance to the 
observer, such as clouds, mountains and aircraft structures, must be managed. For safety reasons 
the operational community may be concerned about the occlusion of a real target by a virtual 
target, since a mid-air collision could be possible. 


7. Shading and illumination of virtual targets by real world objects (sun and clouds) must be 
managed. 


8. Quickly varying luminance levels, ranging from a few foot-lamberts to several hundred foot-lamberts, 
in a sufficient number of luminance steps, should be supported by the display system. 


9. A foveal image resolution of 86 pixels per degree of visual angle (in both the horizontal and 
vertical directions) should be supported. However, progressively lower resolution would suffice 
toward the periphery. 


10. A bi-ocular display with one hundred percent overlap between the fields of view of the two eyes is 
required. When targets come within a range that is less than 160 meters, support of stereopsis 
must be considered. The latter implies a binocular display with different images for each eye 
corresponding with retinal disparity. 
11. The display method should avoid potential conflicts between accommodation and other monocular 
depth cues if simulated targets are displayed at a short observation distance. 


12. The mechanical and optical properties of a device that enables projection of virtual targets 
superimposed on the real world should not compromise pilot safety (e.g. in high-g manoeuvres, in 
ejection or in a crash). 
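To give a sense of scale for requirements 1, 2, 3 and 9, the following sketch works through the arithmetic they imply. It is a back-of-the-envelope illustration, not part of the cited analysis [13]: the function names are hypothetical, and the assumption that foveal resolution is applied uniformly over the whole field of regard is deliberately pessimistic (requirement 9 itself allows lower peripheral resolution).

```python
# Values taken from the requirements listed above.
FOVEAL_PX_PER_DEG = 86             # requirement 9
FIELD_OF_REGARD_DEG = (300, 150)   # requirement 3 (horizontal, vertical)
MIN_UPDATE_HZ = 80                 # requirement 1
MAX_LATENCY_MS = 20                # requirement 2


def uniform_pixel_count(px_per_deg, field_deg):
    """Pixels needed if foveal resolution were applied over the whole field."""
    h_deg, v_deg = field_deg
    return (px_per_deg * h_deg) * (px_per_deg * v_deg)


def frame_period_ms(update_hz):
    """Duration of one image update at the given rate."""
    return 1000.0 / update_hz


def frames_within_latency(update_hz, max_latency_ms):
    """How many frame periods fit inside the end-to-end latency limit."""
    return max_latency_ms / frame_period_ms(update_hz)


pixels = uniform_pixel_count(FOVEAL_PX_PER_DEG, FIELD_OF_REGARD_DEG)
print(f"Uniform-resolution pixel count: {pixels / 1e6:.1f} megapixels")
print(f"Frame period at {MIN_UPDATE_HZ} Hz: {frame_period_ms(MIN_UPDATE_HZ):.1f} ms")
print(f"Frame periods within the latency limit: "
      f"{frames_within_latency(MIN_UPDATE_HZ, MAX_LATENCY_MS):.1f}")
```

Under these assumptions the uniform case calls for roughly 333 million pixels, which illustrates why gaze-based scene management (requirement 5) and reduced peripheral resolution matter, and the 20 ms limit spans fewer than two frame periods at 80 Hz.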


Roessingh, van Sijll & Johnson note that when only a subset of fighter aircraft WVR tasks need to be 
trained with ET, some of these requirements could be relaxed. Roessingh & Verhaaf [6] later provided 
evidence of the positive training effectiveness of the approach and discussed how EVS addresses user 
needs. They identified the visual display of simulated entities within visual range as a current 
technological limitation and thus concluded that EVS is now applicable only to training scenarios that do 
not depend on seeing the threats or friendly forces. However, when compared to full mission 
simulation, EVS in the air provides important sensory cues that are expensive or difficult to provide on 
the ground with full fidelity. These include accurate physical motion, control loading (force reflection) and 
aerodynamic cues. 


Since displays (e.g. for radar and radar warning receiver) in the cockpit can contain both real and virtual 
information at the same time, operators should always be aware of which information is real and which is 
virtual. This helps them to make the appropriate trade-offs and decisions during the training scenario. It is 
clearly not desirable to perform a potentially unsafe manoeuvre or action in response to a virtual entity, 
while this may be totally justified in an operational situation. A potential implementation for symbols on a 
display is to give the virtual entities a dedicated supplementary tag. In the fighter aircraft EVS that was 
used by the RNLAF, this was accomplished by attaching a small “v” to each virtual symbol on all displays 
where they could appear. Naturally, such information should be designed carefully in order to guarantee 
positive transfer of training. Another effective strategy in the design of displays is to give symbols related 
to real entities a higher display priority. This way, symbols related to virtual entities never obscure 
those related to real entities. More advanced means are also possible. Automatic monitoring of the aircraft and 
its interaction with the real environment can prevent unsafe situations. This can be accomplished by the 
continuous evaluation of a number of safety rules by the EVS itself. The simulation stops immediately 
when one of the rules is violated, that is, when it detects an unsafe situation. Naturally, this should be 
properly announced to the operators. As an example, the system can monitor that an aircraft remains in a 
temporarily reserved airspace during the training. It could also automatically detect potential collisions with 
real entities or real terrain. In an overview of the use of embedded training for the F-35, Bills, Flachsbart, 
Kern & Olsen [7] emphasized safety of flight as a consideration within and following a TRAIN mode. 
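The rule-based safety monitoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RNLAF or F-35 implementation: the state fields, the three rules and all numerical limits are invented placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AircraftState:
    # Hypothetical, highly simplified state used only for illustration.
    east_km: float
    north_km: float
    nearest_real_entity_km: float
    height_above_terrain_m: float


@dataclass
class SafetyRule:
    name: str
    is_safe: Callable[[AircraftState], bool]


# Illustrative rules; the limits are invented placeholders, not operational values.
RULES: List[SafetyRule] = [
    SafetyRule("inside reserved airspace",
               lambda s: 0.0 <= s.east_km <= 100.0 and 0.0 <= s.north_km <= 60.0),
    SafetyRule("separation from real entities",
               lambda s: s.nearest_real_entity_km >= 2.0),
    SafetyRule("terrain clearance",
               lambda s: s.height_above_terrain_m >= 300.0),
]


class TrainingSession:
    def __init__(self, rules: List[SafetyRule]) -> None:
        self.rules = rules
        self.running = True
        self.announcement: Optional[str] = None

    def update(self, state: AircraftState) -> None:
        """Evaluate every rule; stop and announce on the first violation."""
        if not self.running:
            return
        for rule in self.rules:
            if not rule.is_safe(state):
                self.running = False
                self.announcement = f"TRAINING STOPPED: {rule.name} violated"
                return


session = TrainingSession(RULES)
session.update(AircraftState(50.0, 30.0, 10.0, 3000.0))   # all rules satisfied
session.update(AircraftState(120.0, 30.0, 10.0, 3000.0))  # left the reserved airspace
print(session.running, session.announcement)
```

Keeping each rule as a named predicate means the announcement to the operators can state which rule ended the simulation, in line with the requirement that a stop be properly announced.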


3.2 Ground Vehicles 


Schmidt [8] provided a keynote address that presented an EVS solution for the Infantry Fighting Vehicle 
(IFV), and Shiflett [9] provided a keynote that presented embedded training as a key performance parameter 
for the Future Combat System. Schmidt described the use of head-mounted displays with the IFV, and 
Magee [10] cited the special vision blocks developed by the U.S. Army Research, Development and 
Engineering Command (RDECOM), which give armoured crews the ability to observe 
computer-generated imagery of the external world. These examples indicate that embedded visual system 
technologies are technically ready for use with ground vehicles, but Schmidt also cautioned that safety 
concerns would likely constrain the movement of ground vehicles and thus the use and effectiveness of 
EVS if physical motion cues are needed for human learning. 


3.3 Dismounted Soldiers 


Dismounted soldiers may present one of the most challenging 
operational environments in which to implement EVS. The ability to stimulate the soldier’s senses and to 
respond to his touch is limited not only by the simulation but also by constraints that include power, 
weight, processing power and communication bandwidth. Power consumption is already a major concern 
in the world’s military. Battery power is not yet efficient enough to support dismounted soldiers over long 
periods of time. Any proposed EVS solutions would have to be fully embedded and not add to the already 
heavy load carried by dismounted soldiers. These fully embedded solutions will likely tax the processing 
power, communications bandwidth and battery power of the operational systems they are contained 
within. 


Soldiers in the modern military move, shoot and communicate, but they also are expected to act as sensors 
in the environment and to negotiate with indigenous people. An EVS for dismounted soldiers must be 
able to present the visual world in sufficient fidelity to allow them to train in an environment that recreates 
the operational environment. Soldiers need to detect the movement of threats at a distance. They need to 
be able to identify potential targets as friend or foe, and to interact with other avatars and non-player 
characters to allow recognition of emotions and body language during negotiation tasks. 


In his keynote address, Schmidt [8] presented an EVS solution for the Infantry Fighting Vehicle that 
included a dismounted capability. This was limited in that it was tethered to the vehicle and it provided a 
fully virtual training environment. This dismounted EVS did provide an effective power solution in that it 
drew power from the Puma vehicle. However, it did not incorporate the live environment as part of the 
EVS, which Dyer [4] has suggested is critical to effective training. A more effective, but also more 
complex and costly, alternative to all-virtual or all-live training is mixed reality. There is near-term 
potential to implement virtual targets. 


3.4 Observations and Opinions 
The following general observations and common opinions were reported by the groups: 

• New EVS technologies, such as augmented reality, create opportunities for expanding the 
  usefulness and range of applications of embedded training. The applications that were identified 
  included air, ground and sea systems involving vehicles, command and control centres, and 
  dismounted combatants. 

• EVS can be a cost-effective alternative to more traditional simulator-based training for 
  vehicle-based tasks and command centres, but is not technically ready for training dismounted 
  combatants. 

• Safety issues associated with vehicle use can constrain the application of EVS; there was much 
  concern about the interplay between training and operational modes, and the movement of 
  operational equipment in an ET mode. 

• There was a general concern about the need for realism, to allow soldiers to train as they fight. 
  Many human factors were identified, including the following: 

  o sensory cue fidelity 
  o multi-modal interactions 
  o sensitivity to interactions between simulated and real worlds 
  o sensitivity to interactions among multiple players in team training applications 
  o consequences for human performance in the EVS and subsequently for training transfer 
  o human adaptation 
  o unwanted side-effects, e.g., simulator-induced sickness 


• EVS technologies were considered to be the most advanced for vision, but limitations were 
  identified for night vision simulation systems, depth cuing, and peripheral vision (i.e., wide field 
  of view). The participants thought that flexible, transparent displays may soon provide a solution 
  to the problem of presenting computer-generated targets within visual range for in-flight fighter 
  training. 

• For hearing, sound effects and three-dimensional rendering are thought to be well developed, but 
  limitations exist for simulating the location of a sound source in three-dimensional space, 
  especially if headphones are not used, and wide variation of abilities among humans was noted. 

• For speech recognition, there is much concern about the masking effects of the often noisy 
  environments of ET, although success has been achieved (e.g., for the F-35). 

• For motion cuing, the physical stimuli can be real and uncompromised, e.g., as EVS was used for 
  in-flight training, or completely absent, e.g., as EVS was used for the infantry fighting vehicles or 
  tanks that do not move because it would be unsafe to operate them in a training mode. 
  Consideration was given to the use of motion seats as a means of providing physical motion cues 
  in vehicles that are stationary, but the costs and bulk of these systems were thought to be 
  prohibitive and of doubtful utility since much of the literature on motion cuing says that it is not 
  necessary for effective training transfer for most tasks. Tactile vests and tactile belts are also a 
  possible, more affordable alternative, but their effectiveness is unknown. 

• For haptic and vibration cuing, like motion cuing, the physical stimuli can be real and 
  uncompromised, as EVS was used for the in-flight training demonstration for the F-16, or they can 
  be absent or seriously compromised, in the way that EVS was implemented as a virtual reality 
  (VR) experience for robot control. Consideration was given to the use of force loaders in a 
  training mode for vehicles, but the costs and bulk of these systems were again thought to be 
  prohibitive. 

• For smell, odours can be simulated, but are difficult to remove from the training environment. 
  Emergency medicine was one area where the user’s needs might justify the use of a chemical 
  simulator, but few other tasks were thought to need this type of sensory cue for effective training. 

• Inter- and intra-modal consistency was recognized as a challenge. For instance, the relationship 
  between simulated images, provided to an HMD with head movements, and the real world presents 
  a challenge for head-tracking technology. The sensitivity of human operators to conflicting 
  information within or between sensory systems was identified as a human factor that could lead to 
  unwanted side-effects, such as simulator-induced sickness. 

• Additional challenges exist in developing and interacting with intelligent agents (e.g., virtual 
  humans and intelligent tutoring systems) within EVS. The inherent remoteness of EVS 
  (especially in deployed settings) poses a challenging environment for instruction and feedback to 
  trainees. It is impractical to envision EVS in the future where human instructors/tutors would be 
  available to support either individual or collective training on the scale required. It is more likely 
  that tutors will be intelligent agents with adaptive algorithms to provide tailored training to both 
  individuals and teams in remote locations. 


4. CONCLUSION AND RECOMMENDATIONS FOR FUTURE RESEARCH 


The workshop revealed several successful applications that provided effective human interfaces to EVS 
systems. These included interfaces for robot control, fighter training, and combat ground vehicles. In 
comparison, effective human interfaces for dismounted combatants were not revealed. For all types of 
systems, concerns about weight, size, and safety, as well as technological constraints, were found to restrict 
the design and use of the human interface to an EVS system. For example, the cues of physical motion are 
limited with embedded training in a tank because it could be unsafe if movements of the vehicle are 
unrestricted during an ET exercise. In an ideal EVS implementation, the user will not be able to tell the 
difference between the real and simulated environment. Thus, it will be unclear to the user if a failure is 
real or simulated. Unlike traditional simulation, EVS needs to include a method to remind the soldier 
about its mode. 


Human perception is multimodal. The psychophysical study [3] of the visual perception of gravitational 
direction, its susceptibility to scene detail and haptic input, and the psychophysical study [3] of the fusion 
of light and sound in space, and its susceptibility to head and eye orientation, illustrate the interplay within 
and between the senses in determining our perception of the environment and our place in it. The 
workshop participants concluded that the relationship between training and real environments, that is, the 
fidelity of the EVS and the consequences for training transfer, remains a concern due to technological 
limitations and a lack of behavioural information about the efficacy of EVS. 


Many weapon platforms are operated by teams, and platforms operate with other platforms in many 
missions. Obviously, team training is an important area of application for EVS. Human interface 
requirements need to allow co-ordination among team members and platforms. This may require dedicated 
communication channels. The use of EVS for team training could promote unity in operational procedures 
and doctrines and help train effective communication techniques. Training scenarios could, inter alia, be 
based on actual battlefield incidents involving factors related to teamwork. 


As discussed above, artificially intelligent agents are more likely than human tutors to be deployed with operational 
equipment to support EVS in the future. It is envisioned that part of the human interaction problem space 
for making intelligent tutoring systems and virtual humans practical for EVS will include low cost, 
unobtrusive methods for sensing behaviors (e.g., actions, gestures) and physiology (e.g., heart rate and 
galvanic skin response). Behaviors and physiology (observable trainee states) will then be used to predict 
cognitive states (i.e., unobserved trainee states). Predicted cognitive states that are relevant to learning 
could include affective variables like frustration and confusion or others like attention and engagement. 
Methods to accurately predict these cognitive states will determine the adaptability of any computer-based 
tutor and either limit or enhance the trainee’s perception of the tutor’s persona, credibility and 
supportiveness. In other words, the tutor’s effectiveness (in terms of learning) is likely to be limited by 
the trainee’s acceptance of the artificially intelligent tutor, and that acceptance (or lack thereof) will be limited by the tutor’s 
ability to predict the state of the trainee at least as effectively as a human tutor. 


The costs of simulation have long been hypothesized to grow geometrically with fidelity, while training 
transfer has been hypothesized to follow an S-shaped curve, increasing with fidelity in the way that the 
area under a normal distribution accumulates [11]. This is the problem facing decision-makers who must decide 
how much to spend on a simulator, or how much to include in an EVS. The plots of these relationships 
could be very useful if they were based on actual data since they could be used to identify the amount of 
fidelity that yields the most training value for cost. However, there is no objective method for measuring 
the overall fidelity of a training device or EVS system. There are few studies and many different measures 
of training transfer. Thus, there is a need to develop measures and to determine their relationships. In 
addition, subjective opinion, human adaptation and simulator-induced sickness are outcomes that further 
complicate our understanding of the human factors associated with the design, use and evaluation of EVS. 
On this basis, a number of the workshop participants thought that EVS should be considered an 
extension of traditional training methods and not yet their replacement [12]. 
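The hypothesized cost and transfer curves described above can be made concrete with a small sketch. It is purely illustrative: fidelity is treated as a single number between 0 and 1 (which, as noted, cannot currently be measured objectively), and the curve parameters are arbitrary assumptions rather than data from [11].

```python
import math


def cost(fidelity, base=1.0, growth=50.0):
    """Hypothesized cost: geometric growth with fidelity (assumed parameters)."""
    return base * growth ** fidelity


def transfer(fidelity, mean=0.5, sd=0.15):
    """Hypothesized transfer: cumulative area under a normal curve (assumed parameters)."""
    return 0.5 * (1.0 + math.erf((fidelity - mean) / (sd * math.sqrt(2.0))))


def best_value_fidelity(steps=100):
    """Fidelity level on a regular grid that maximises transfer per unit cost."""
    levels = [i / steps for i in range(steps + 1)]
    return max(levels, key=lambda f: transfer(f) / cost(f))


best = best_value_fidelity()
print(f"Best value at fidelity {best:.2f}: "
      f"transfer {transfer(best):.2f}, relative cost {cost(best):.1f}")
```

With curves of this shape the best value falls at an intermediate fidelity: beyond it, each further increase in fidelity costs proportionally more than it returns in transfer. The location of that point depends entirely on the assumed parameters, which is why empirical measures of fidelity and transfer are needed.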